WTF-LOD - A New Resource for Large-Scale NER Evaluation
Abstract
This paper introduces the Web TextFull linkage to Linked Open Data (WTF-LOD) dataset, intended for large-scale evaluation of named entity recognition (NER) systems. First, we present the process of collecting data from the largest publicly available textual corpora, including Wikipedia dumps, monthly runs of the CommonCrawl, and ClueWeb09/12. We discuss similarities to and differences from related initiatives such as WikiLinks and WikiReverse. Our work primarily focuses on links from “textfull” documents (links surrounded by text that provides a useful context for entity linking), de-duplication of the data, and advanced cleaning procedures. The presented statistics demonstrate that the collected data forms one of the largest available resources of its kind. They also show the suitability of the resulting dataset for complex NER evaluation campaigns, including an analysis of the most ambiguous name mentions appearing in the data.
Similar resources
Resource Based View: A Promising New Theory for Healthcare Organizations; Comment on “Resource Based View of the Firm as a Theoretical Lens on the Organisational Consequences of Quality Improvement”
This commentary reviews a recent piece by Burton and Rycroft-Malone on the use of Resource Based View (RBV) in healthcare organizations. It first outlines the core content of their piece. It then discusses their attempts to extend RBV to the analysis of large scale quality improvement efforts in healthcare. Some critique is elaborated. The broader question of why RBV seems to be migrating into ...
Frank: The LOD Cloud at Your Fingertips
Large-scale, algorithmic access to LOD Cloud data has been hampered by the absence of queryable endpoints for many datasets, a plethora of serialization formats, and an abundance of idiosyncrasies such as syntax errors. As of late, very large-scale – hundreds of thousands of documents, tens of billions of triples – access to RDF data has become possible thanks to the LOD Laundromat Web Service. ...
Concurrent control on resource planning and revenue/expenditure estimation in large-scale shell material embankment projects management using discrete-event simulation
Resource planning in large-scale construction projects has been a complicated management issue requiring mechanisms to facilitate decision making for managers. In the present study, a computer-aided simulation model is developed based on concurrent control of resources and revenue/expenditure. The proposed method responds to the demand of resource management and scheduling in shell material emb...
The Design and Implementation of the Wave Transactional Filesystem
This paper introduces the Wave Transactional Filesystem (WTF), a novel, transactional, POSIX-compatible filesystem based on a new file slicing API that enables efficient file transformations. WTF provides transactional access to a distributed filesystem, eliminating the possibility of inconsistencies across multiple files. Further, the file slicing API enables applications to construct files fr...
Frank: Algorithmic Access to the LOD Cloud
Large-scale, algorithmic access to LOD Cloud data has been hampered by the absence of queryable endpoints for many datasets, a plethora of serialization formats, and an abundance of idiosyncrasies such as syntax errors. As of late, very large-scale – hundreds of thousands of documents, tens of billions of triples – access to RDF data has become possible thanks to the LOD Laundromat Web Service. ...